NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Emerging Microelectronic Materials by Design: Navigating Combinatorial Design Space with Scarce and Dispersed Data

https://doi.org/10.1021/accountsmr.5c00011

Zhang, Hengrui; Georgescu, Alexandru B; Yerramilli, Suraj; Karpovich, Christopher; Apley, Daniel W; Olivetti, Elsa A; Rondinelli, James M; Chen, Wei (June 2025, Accounts of Materials Research)

Free, publicly-accessible full text available June 27, 2026
PLAUSIBLE INFERENCE WITH A PLAUSIBLE LIPSCHITZ CONSTANT

Keslin, Gregory; Apley, Daniel W; Nelson, Barry L (December 2024, Proceedings of the 2024 Winter Simulation Conference)
Lam, H; Azar, E; Batur, D; Gao, S; Xie, W; Hunter, SR; Rossetti, MD (Ed.)
Plausible inference is a growing body of literature that treats stochastic simulation as a gray box when structural properties of the simulation output performance measures as a function of design, decision or contextual variables are known. Plausible inference exploits these properties to allow the outputs from values of decision variables that have been simulated to provide inference about output performance measures at values of decision variables that have not been simulated; statements about the possible optimality or feasibility are examples. Lipschitz continuity is a structural property of many simulation problems. Unfortunately, the all-important—and essential for plausible inference—Lipschitz constant is rarely known. In this paper we show how to obtain plausible inference with an estimated Lipschitz constant that is also derived by plausible inference reasoning, as well as how to create the experiment design to simulate.
more » « less
Full Text Available
Automated crystal system identification from electron diffraction patterns using multiview opinion fusion machine learning

https://doi.org/10.1073/pnas.2309240120

Chen, Jie; Zhang, Hengrui; Wahl, Carolin B; Liu, Wei; Mirkin, Chad A; Dravid, Vinayak P; Apley, Daniel W; Chen, Wei (November 2023, Proceedings of the National Academy of Sciences)

A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and the predictive uncertainties were quantified and leveraged to fuse multiview predictions. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains remarkable accuracy using experimental data. This work highlights the ability of ML to be used to accelerate experimental high-throughput materials data analytics.
more » « less
Full Text Available
Uncertainty-aware mixed-variable machine learning for materials design

https://doi.org/10.1038/s41598-022-23431-2

Zhang, Hengrui; Chen, Wei; Iyer, Akshay; Apley, Daniel W.; Chen, Wei (December 2022, Scientific Reports)

Abstract Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
more » « less
Full Text Available
Scalable Gaussian Processes for Data-Driven Design Using Big Data With Categorical Factors

https://doi.org/10.1115/1.4052221

Wang, Liwei; Yerramilli, Suraj; Iyer, Akshay; Apley, Daniel; Zhu, Ping; Chen, Wei (February 2022, Journal of Mechanical Design)
null (Ed.)
Abstract Scientific and engineering problems often require the use of artificial intelligence to aid understanding and the search for promising designs. While Gaussian processes (GP) stand out as easy-to-use and interpretable learners, they have difficulties in accommodating big data sets, categorical inputs, and multiple responses, which has become a common challenge for a growing number of data-driven design applications. In this paper, we propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously. The method is built upon the latent-variable Gaussian process (LVGP) model where categorical factors are mapped into a continuous latent space to enable GP modeling of mixed-variable data sets. By extending variational inference to LVGP models, the large training data set is replaced by a small set of inducing points to address the scalability issue. Output response vectors are represented by a linear combination of independent latent functions, forming a flexible kernel structure to handle multiple responses that might have distinct behaviors. Comparative studies demonstrate that the proposed method scales well for large data sets with over 104 data points, while outperforming state-of-the-art machine learning methods without requiring much hyperparameter tuning. In addition, an interpretable latent space is obtained to draw insights into the effect of categorical factors, such as those associated with “building blocks” of architectures and element choices in metamaterial and materials design. Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism with aperiodic microstructures and multiple materials.
more » « less
Full Text Available
Bias-corrected Estimation of the Density of a Conditional Expectation in Nested Simulation Problems

https://doi.org/10.1145/3462201

Yang, Ran; Kent, David; Apley, Daniel W.; Staum, Jeremy; Ruppert, David (October 2021, ACM Transactions on Modeling and Computer Simulation)

Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities with noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation.
more » « less
Full Text Available
Bayesian Optimization for Materials Design with Mixed Quantitative and Qualitative Variables

https://doi.org/10.1038/s41598-020-60652-9

Zhang, Yichi; Apley, Daniel W.; Chen, Wei (December 2020, Scientific Reports)

Full Text Available
Scalable Adaptive Batch Sampling in Simulation-Based Design With Heteroscedastic Noise

https://doi.org/10.1115/1.4049134

van Beek, Anton; Ghumman, Umar Farooq; Munshi, Joydeep; Tao, Siyu; Chien, TeYu; Balasubramanian, Ganesh; Plumlee, Matthew; Apley, Daniel; Chen, Wei (March 2021, Journal of Mechanical Design)
null (Ed.)
Abstract In this study, we propose a scalable batch sampling scheme for optimization of simulation models with spatially varying noise. The proposed scheme has two primary advantages: (i) reduced simulation cost by recommending batches of samples at carefully selected spatial locations and (ii) improved scalability by actively considering replicating at previously observed sampling locations. Replication improves the scalability of the proposed sampling scheme as the computational cost of adaptive sampling schemes grow cubicly with the number of unique sampling locations. Our main consideration for the allocation of computational resources is the minimization of the uncertainty in the optimal design. We analytically derive the relationship between the “exploration versus replication decision” and the posterior variance of the spatial random process used to approximate the simulation model’s mean response. Leveraging this reformulation in a novel objective-driven adaptive sampling scheme, we show that we can identify batches of samples that minimize the prediction uncertainty only in the regions of the design space expected to contain the global optimum. Finally, the proposed sampling scheme adopts a modified preposterior analysis that uses a zeroth-order interpolation of the spatially varying simulation noise to identify sampling batches. Through the optimization of three numerical test functions and one engineering problem, we demonstrate (i) the efficacy and of the proposed sampling scheme to deal with a wide array of stochastic functions, (ii) the superior performance of the proposed method on all test functions compared to existing methods, (iii) the empirical validity of using a zeroth-order approximation for the allocation of sampling batches, and (iv) its applicability to molecular dynamics simulations by optimizing the performance of an organic photovoltaic cell as a function of its processing settings.
more » « less
Full Text Available
Scalable Objective-Driven Batch Sampling in Simulation-Based Design for Models With Heteroscedastic Noise

https://doi.org/10.1115/DETC2020-22629

van Beek, Anton; Farooq Ghumman, Umar; Munshi, Joydeep; Tao, Siyu; Chien, TeYu; Balasubramanian, Ganesh; Plumlee, Matthew; Apley, Daniel; Chen, Wei (November 2020, Scalable Objective-Driven Batch Sampling in Simulation-Based Design for Models With Heteroscedastic Noise)
null (Ed.)
Abstract Objective-driven adaptive sampling is a widely used tool for the optimization of deterministic black-box functions. However, the optimization of stochastic simulation models as found in the engineering, biological, and social sciences is still an elusive task. In this work, we propose a scalable adaptive batch sampling scheme for the optimization of stochastic simulation models with input-dependent noise. The developed algorithm has two primary advantages: (i) by recommending sampling batches, the designer can benefit from parallel computing capabilities, and (ii) by replicating of previously observed sampling locations the method can be scaled to higher-dimensional and more noisy functions. Replication improves numerical tractability as the computational cost of Bayesian optimization methods is known to grow cubicly with the number of unique sampling locations. Deciding when to replicate and when to explore depends on what alternative minimizes the posterior prediction accuracy at and around the spatial locations expected to contain the global optimum. The algorithm explores a new sampling location to reduce the interpolation uncertainty and replicates to improve the accuracy of the mean prediction at a single sampling location. Through the application of the proposed sampling scheme to two numerical test functions and one real engineering problem, we show that we can reliably and efficiently find the global optimum of stochastic simulation models with input-dependent noise.
more » « less
Full Text Available
A Latent Variable Approach to Gaussian Process Modeling with Qualitative and Quantitative Factors

https://doi.org/10.1080/00401706.2019.1638834

Zhang, Yichi; Tao, Siyu; Chen, Wei; Apley, Daniel W. (July 2019, Technometrics)

Full Text Available

« Prev Next »

Search for: All records